Dataset statistics
| Number of variables | 7 |
|---|---|
| Number of observations | 4000 |
| Missing cells | 2223 |
| Missing cells (%) | 7.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 218.9 KiB |
| Average record size in memory | 56.0 B |
Variable types
| NUM | 4 |
|---|---|
| CAT | 2 |
| BOOL | 1 |
| Height has 1894 (47.4%) missing values | Missing |
|---|---|
| Weight has 326 (8.2%) missing values | Missing |
| PATIENT_ID has unique values | Unique |
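The overview figures can be cross-checked against each other. The following is a minimal sketch in plain Python that uses only numbers quoted in this report (the Gender missing count comes from the Gender section); the variable names are illustrative and are not part of pandas-profiling.

```python
# Figures copied from this report.
n_variables = 7
n_observations = 4000
missing_by_column = {"Height": 1894, "Weight": 326, "Gender": 3}
total_size_kib = 218.9

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells

# 1 KiB = 1024 bytes, spread evenly over all records.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(missing_cells)               # 2223
print(round(missing_pct, 1))       # 7.9
print(round(avg_record_bytes, 1))  # 56.0
```

The three per-column missing counts add up to the reported number of missing cells, so no other variable appears to contain missing values.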
Reproduction
| Analysis started | 2020-09-17 13:46:38.843291 |
|---|---|
| Analysis finished | 2020-09-17 13:46:48.004032 |
| Duration | 9.16 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
PATIENT_ID
Real number (ℝ≥0)
| Distinct | 4000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 137605.122 |
|---|---|
| Minimum | 132539 |
| Maximum | 142673 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 31.2 KiB |
Quantile statistics
| Minimum | 132539 |
|---|---|
| 5-th percentile | 133038.95 |
| Q1 | 135075.75 |
| median | 137592.5 |
| Q3 | 140100.25 |
| 95-th percentile | 142176.2 |
| Maximum | 142673 |
| Range | 10134 |
| Interquartile range (IQR) | 5024.5 |
Descriptive statistics
| Standard deviation | 2923.608886 |
|---|---|
| Coefficient of variation (CV) | 0.02124636673 |
| Kurtosis | -1.191489871 |
| Mean | 137605.122 |
| Median Absolute Deviation (MAD) | 2513 |
| Skewness | 0.00584744361 |
| Sum | 550420488 |
| Variance | 8547488.92 |
| Monotonicity | Strictly increasing |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 135162 | 1 | < 0.1% | |
| 139889 | 1 | < 0.1% | |
| 142205 | 1 | < 0.1% | |
| 141960 | 1 | < 0.1% | |
| 133764 | 1 | < 0.1% | |
| 135811 | 1 | < 0.1% | |
| 139905 | 1 | < 0.1% | |
| 142106 | 1 | < 0.1% | |
| 135803 | 1 | < 0.1% | |
| 137850 | 1 | < 0.1% | |
| Other values (3990) | 3990 | 99.8% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 132539 | 1 | < 0.1% | |
| 132540 | 1 | < 0.1% | |
| 132541 | 1 | < 0.1% | |
| 132543 | 1 | < 0.1% | |
| 132545 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 142673 | 1 | < 0.1% | |
| 142671 | 1 | < 0.1% | |
| 142670 | 1 | < 0.1% | |
| 142667 | 1 | < 0.1% | |
| 142665 | 1 | < 0.1% |
ihd
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.2 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 3446 | 86.2% | |
| 1 | 554 | 13.9% |
Age
Real number (ℝ≥0)
| Distinct | 76 |
|---|---|
| Distinct (%) | 1.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 64.2475 |
|---|---|
| Minimum | 15 |
| Maximum | 90 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 31.2 KiB |
Quantile statistics
| Minimum | 15 |
|---|---|
| 5-th percentile | 29 |
| Q1 | 52.75 |
| median | 67 |
| Q3 | 78 |
| 95-th percentile | 89 |
| Maximum | 90 |
| Range | 75 |
| Interquartile range (IQR) | 25.25 |
Descriptive statistics
| Standard deviation | 17.56094646 |
|---|---|
| Coefficient of variation (CV) | 0.2733327594 |
| Kurtosis | -0.3325881438 |
| Mean | 64.2475 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | -0.6062680114 |
| Sum | 256990 |
| Variance | 308.3868405 |
| Monotonicity | Not monotonic |
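Several of the Age statistics above are simple functions of the others. As a hedged illustration, the sketch below recomputes them from the reported values and compares the results with the table; it does not reproduce how pandas-profiling computes them internally, and the tolerances are arbitrary choices that allow for rounding in the printed figures.

```python
import math

# Values copied from the Age section of this report.
n = 4000                      # observations (Age has no missing values)
mean = 64.2475
std = 17.56094646
minimum, q1, q3, maximum = 15, 52.75, 78, 90

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n                  # Sum

# Compare with the figures printed in the report.
assert value_range == 75
assert iqr == 25.25
assert math.isclose(cv, 0.2733327594, abs_tol=1e-6)
assert math.isclose(variance, 308.3868405, abs_tol=1e-4)
assert math.isclose(total, 256990, abs_tol=1e-6)
```

The same relations hold for the other numeric variables, with the caveat that for Height and Weight the sum is taken over non-missing observations only.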
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 90 | 164 | 4.1% | |
| 77 | 126 | 3.1% | |
| 78 | 104 | 2.6% | |
| 83 | 102 | 2.5% | |
| 79 | 100 | 2.5% | |
| 74 | 93 | 2.3% | |
| 72 | 90 | 2.2% | |
| 80 | 89 | 2.2% | |
| 73 | 88 | 2.2% | |
| 81 | 86 | 2.1% | |
| Other values (66) | 2958 | 74.0% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 15 | 1 | < 0.1% | |
| 16 | 1 | < 0.1% | |
| 17 | 4 | 0.1% | |
| 18 | 12 | 0.3% | |
| 19 | 18 | 0.4% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 90 | 164 | 4.1% | |
| 89 | 41 | 1.0% | |
| 88 | 51 | 1.3% | |
| 87 | 49 | 1.2% | |
| 86 | 65 | 1.6% |
Gender
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 3 |
| Missing (%) | 0.1% |
| Memory size | 31.2 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| male | 2246 | 56.1% | |
| female | 1751 | 43.8% | |
| (Missing) | 3 | 0.1% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.87475 |
| Min length | 3 |
Height
Real number (ℝ≥0)
| Distinct | 70 |
|---|---|
| Distinct (%) | 3.3% |
| Missing | 1894 |
| Missing (%) | 47.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 169.787227 |
|---|---|
| Minimum | 1.8 |
| Maximum | 431.8 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 31.2 KiB |
Quantile statistics
| Minimum | 1.8 |
|---|---|
| 5-th percentile | 152.4 |
| Q1 | 162.6 |
| median | 170.2 |
| Q3 | 177.8 |
| 95-th percentile | 185.4 |
| Maximum | 431.8 |
| Range | 430 |
| Interquartile range (IQR) | 15.2 |
Descriptive statistics
| Standard deviation | 20.17460395 |
|---|---|
| Coefficient of variation (CV) | 0.1188228603 |
| Kurtosis | 77.10913097 |
| Mean | 169.787227 |
| Median Absolute Deviation (MAD) | 7.6 |
| Skewness | 3.303175323 |
| Sum | 357571.9 |
| Variance | 407.0146444 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 177.8 | 213 | 5.3% | |
| 182.9 | 188 | 4.7% | |
| 170.2 | 173 | 4.3% | |
| 172.7 | 169 | 4.2% | |
| 167.6 | 162 | 4.0% | |
| 162.6 | 153 | 3.8% | |
| 157.5 | 142 | 3.5% | |
| 175.3 | 137 | 3.4% | |
| 165.1 | 124 | 3.1% | |
| 180.3 | 115 | 2.9% | |
| Other values (60) | 530 | 13.2% | |
| (Missing) | 1894 | 47.3% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1.8 | 1 | < 0.1% | |
| 13 | 3 | 0.1% | |
| 13.7 | 1 | < 0.1% | |
| 14 | 2 | 0.1% | |
| 15.2 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 431.8 | 1 | < 0.1% | |
| 426.7 | 1 | < 0.1% | |
| 419.1 | 1 | < 0.1% | |
| 406.4 | 1 | < 0.1% | |
| 398.8 | 1 | < 0.1% |
ICUType
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.2 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Medical ICU | 1481 | 37.0% | |
| Surgical ICU | 1068 | 26.7% | |
| Cardiac Surgery Recovery Unit | 874 | 21.9% | |
| Coronary Care Unit | 577 | 14.4% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 29 |
|---|---|
| Median length | 12 |
| Mean length | 16.20975 |
| Min length | 11 |
Weight
Real number (ℝ≥0)
| Distinct | 836 |
|---|---|
| Distinct (%) | 22.8% |
| Missing | 326 |
| Missing (%) | 8.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 81.47826892 |
|---|---|
| Minimum | 21.7 |
| Maximum | 300 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 31.2 KiB |
Quantile statistics
| Minimum | 21.7 |
|---|---|
| 5-th percentile | 50.5 |
| Q1 | 66 |
| median | 78.6 |
| Q3 | 92 |
| 95-th percentile | 122 |
| Maximum | 300 |
| Range | 278.3 |
| Interquartile range (IQR) | 26 |
Descriptive statistics
| Standard deviation | 23.62843246 |
|---|---|
| Coefficient of variation (CV) | 0.2899967411 |
| Kurtosis | 7.409920777 |
| Mean | 81.47826892 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 1.704291978 |
| Sum | 299351.16 |
| Variance | 558.3028205 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 70 | 98 | 2.5% | |
| 80 | 79 | 2.0% | |
| 90 | 68 | 1.7% | |
| 60 | 61 | 1.5% | |
| 65 | 56 | 1.4% | |
| 75 | 54 | 1.4% | |
| 100 | 53 | 1.3% | |
| 85 | 43 | 1.1% | |
| 77 | 32 | 0.8% | |
| 82 | 31 | 0.8% | |
| Other values (826) | 3099 | 77.5% | |
| (Missing) | 326 | 8.2% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 21.7 | 1 | < 0.1% | |
| 31.7 | 1 | < 0.1% | |
| 32 | 1 | < 0.1% | |
| 34.6 | 1 | < 0.1% | |
| 35 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 300 | 1 | < 0.1% | |
| 280 | 1 | < 0.1% | |
| 253 | 1 | < 0.1% | |
| 230 | 1 | < 0.1% | |
| 220 | 3 | 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
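The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report generator runs; the helper name `pearson_r` is hypothetical, and the sample data are the six records from the "First rows" table in which both Height and Weight are present, so the result says nothing about the full dataset.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so raw sums of products and squares are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant input has zero variance and makes r undefined.
    return cov / math.sqrt(var_x * var_y)

# Height and Weight from the "First rows" records with both values present.
height = [175.3, 180.3, 180.3, 162.6, 162.6, 175.3]
weight = [76.0, 84.6, 114.0, 87.0, 48.4, 66.1]

r = pearson_r(height, weight)

# Invariance under separate changes of location and scale:
# shifting and rescaling each variable leaves r unchanged.
r_rescaled = pearson_r([(h - 100) / 100 for h in height],
                       [w * 2.5 + 10 for w in weight])
print(r, r_rescaled)
```

The second call applies an arbitrary shift and positive rescaling to each variable and should print the same value up to floating-point error.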
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
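As a hedged sketch of that recipe, the code below ranks each variable and then applies the Pearson formula to the ranks. Giving tied values the average of their positions is an assumption made here, since the text does not say how ties are handled; the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A nonlinear but strictly increasing relationship: the ranks agree exactly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

For this cubic relationship ρ should equal 1 up to floating-point error, while Pearson's r on the raw values is lower, which illustrates the difference between monotonic and linear correlation.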
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
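The pair-counting definition above translates directly into code. The sketch below implements exactly that formula, sometimes called tau-a; statistical libraries often default to a tie-adjusted variant, so values from this sketch may differ from library output or from the report when the data contain ties. The function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite ways
            discordant += 1
        # direction == 0 means a tie in x or y; it counts as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3], [1, 2, 3]))  # 1.0
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))  # -1.0
```

Comparing every pair takes quadratic time, which is adequate for illustration but slow on large samples.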
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | PATIENT_ID | ihd | Age | Gender | Height | ICUType | Weight |
|---|---|---|---|---|---|---|---|
| 0 | 132539 | 0 | 54 | female | NaN | Surgical ICU | NaN |
| 1 | 132540 | 0 | 76 | male | 175.3 | Cardiac Surgery Recovery Unit | 76.0 |
| 2 | 132541 | 0 | 44 | female | NaN | Medical ICU | 56.7 |
| 3 | 132543 | 0 | 68 | male | 180.3 | Medical ICU | 84.6 |
| 4 | 132545 | 0 | 88 | female | NaN | Medical ICU | NaN |
| 5 | 132547 | 0 | 64 | male | 180.3 | Coronary Care Unit | 114.0 |
| 6 | 132548 | 0 | 68 | female | 162.6 | Medical ICU | 87.0 |
| 7 | 132551 | 1 | 78 | female | 162.6 | Medical ICU | 48.4 |
| 8 | 132554 | 0 | 64 | female | NaN | Medical ICU | 60.7 |
| 9 | 132555 | 0 | 74 | male | 175.3 | Cardiac Surgery Recovery Unit | 66.1 |
Last rows
| | PATIENT_ID | ihd | Age | Gender | Height | ICUType | Weight |
|---|---|---|---|---|---|---|---|
| 3990 | 142655 | 0 | 43 | male | NaN | Medical ICU | 92.9 |
| 3991 | 142659 | 0 | 88 | male | NaN | Coronary Care Unit | 90.7 |
| 3992 | 142661 | 0 | 89 | male | 177.8 | Surgical ICU | 64.0 |
| 3993 | 142662 | 0 | 86 | male | 162.6 | Medical ICU | 53.0 |
| 3994 | 142664 | 0 | 51 | female | NaN | Surgical ICU | 75.0 |
| 3995 | 142665 | 0 | 70 | female | NaN | Surgical ICU | 87.0 |
| 3996 | 142667 | 0 | 25 | male | NaN | Medical ICU | 166.4 |
| 3997 | 142670 | 0 | 44 | male | NaN | Medical ICU | 109.0 |
| 3998 | 142671 | 1 | 37 | male | NaN | Medical ICU | 87.4 |
| 3999 | 142673 | 0 | 78 | female | 157.5 | Surgical ICU | 70.7 |